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Abstract 

In a previous paper, we showed how entanglement of formation can be defined as 
a minimum of the quantum conditional mutual information (a.k.a. quantum condi- 
tional information transmission). In classical information theory, the Arimoto-Blahut 
method is one of the preferred methods for calculating extrema of mutual informa- 
tion. In this paper, we present a new method, akin to the Arimoto-Blahut method, 
for calculating entanglement of formation. We also present several examples com- 
puted with a computer program called Causa Comun that implements the ideas of 
this paper. 



1 



1 Introduction 



This paper continues a series of papers JT|-[Q] investigating the connection between 
quantum entanglement and conditional information transmission (a.k.a conditional 
mutual information, abbreviated CMI). In the last paper of that series, we expressed 
the entanglement of formation as a minimum of the quantum CMI. Eureka! In clas- 
sical information theory, one of the preferred methods for numerically calculating 
extrema of mutual information is the Arimoto-Blahut algorithm j§]-[B|]. One wonders 
whether something akin to that algorithm can be used to calculate entanglement. 
After much huffing and puffing, we have found the answer to be yes. In this paper 
we present our first results. More specifically, we present a new algorithm that yields 
the entanglement of formation of any bi-partite density matrix and a corresponding 
optimum decomposition of that density matrix. Generalization of the algorithm to 
n-partite systems appears straightforward but we do not address it here. We also 
describe a C++ computer program called Causa Comun that implements the ideas 
of this paper. Finally, we present some examples computed with Causa Comun. 

Prior to us, as far as we know, only one group of researchers [f5]] has ever used a 
quantum version of the Arimoto-Blahut algorithm. They used it to calculate quantum 
channel capacities. 

There exist other excellent computer programs, written prior to ours, that 
can calculate various features of quantum entanglement. See Ref.|§ and |§. The 
software described in Ref.|| also calculates entanglement of formation and optimal 
decompositions, but it uses a conjugate gradient method that is very different from 
ours. 

2 Notation 

In this section, we will introduce certain notation which is used throughout the paper. 

Let Bool = {0, 1}. For any finite set S, let \S\ denote the number of elements 
in S. The Kronecker delta function 5(x, y) equals one if x = y and zero otherwise. We 
will often abbreviate S(x, y) by 8%. Sometimes we will replace an index by the symbol 
"•" . By this we mean that all values of the index are included. For example, if we are 
dealing with w a where a G {1, 2, ... n}, w, will represent the vector (wi, w 2 , ■ ■ ■ , w n ). 
For any Hilbert space H, dim(TC) will stand for the dimension of TC. If \ip) G TC, then 
we will often represent the projection operator \tp)(ifj\ by 7r(ip). 

We will underline random variables. For example, we might write P(x = x) 
or Px_{x) for the probability that the random variable x assumes value x. P(x = 
x) = Px_{x) will often be abbreviated by P(x) when no confusion is likely. S% will 
denote the set of values which the random variable x may assume, and N% will denote 
the number of elements in S%. With each random variable x, we will associate an 
orthonormal basis G S^} which we will call the x basis. TCx will represent the 

Hilbert space spanned by the x basis. For any \ipx) G Tlx, we will use ip x to represent 
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For any two random variables x and y, Sx, y will represent the direct product 
set Sx x S y = {(x, y)\x G Sx,y G S y }. Furthermore, T~ix,y will represent Ti x _®'H y , the 
tensor product of Hilbert spaces TCx and TC y . If \x) for all x is the x basis and \y) for 
all 1/ is the y basis, then 7^ is the vector space spanned by {\x, y)\x G Sx, y G S y }, 
where \x, y) — \x) \y). 

pd(S'^) will denote the set of all probability distributions Px for the random 
variable x; i.e., all functions Px '■ Sx — > [0,1] such that J2 x Px(x) = 1- dm(7i £ ) will 
denote the set of all density matrices p^ acting on the Hilbert space TCx] i.e., the set of 
all Nx dimensional Hermitian matrices with unit trace and non-negative eigenvalues. 

Whenever we use the word "ditto", as in "X (ditto, Y)", we mean that the 
statement is also true if X is replaced by Y. For example, if we say "A (ditto, X) is 
smaller than B (ditto, Y)", we mean "A is smaller than B" and 'X is smaller than 
Y". 

Given any function f(x) defined for all x G A, we define 

/0) f(x) 



ExeA numerator J2 x eA f(x) 

This is just a shorthand, useful when f(x) is a long expression, to avoid writing f(x) 
explicitly twice. 

This paper will also utilize certain notation associated with classical and quan- 
tum entropy. See Refs. [P-Oy , [ IT | for definitions and examples of such notation. In par- 



ticular, we will assume that the reader is familiar with the definition of the classical 
entropies H(x), H(x\y) (conditional entropy) and H(x : y) (mutual entropy) associ- 
ated with any P m G pd(S^ y ). We will also assume that the reader is familiar with 
the definitions of the quantum entropies S Pxy (x), S p (x\y) and S Pxy (x : y) associated 
with any px y G dm(W £y ). 

For Px, P' x G pd(5^), the classical Kullback-Leibler (KL) distance is defined by 

D{PJ/P'^ = £P(z) \og 2 [P(x)/P'(x)\ . (2) 

x 

For Px, p'x G dm(Hx), the quantum KL distance is defined by 

D (Px//Px) = tr £ |> £ (log a px - log 2 p'x)} ■ (3) 

The classical (and quantum) KL distance is always non-negative and equals zero iff its 
two arguments are equal. It has many other useful properties. For more information 
about the KL distance, see [11| for the classical case and [0 for the quantum one. 



When discussing classical physics (ditto, quantum physics), we will refer to 
various probability distributions (ditto, density matrices) which are "descendants" of 
(i.e., can be derived from) a parent Px y a G pd(Sx y a) (ditto, p" y G dm(7Y £2/ )). As an 
aid to the reader, here is a table mapping the classical descendants to their quantum 
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counterparts. The reader may find it helpful to continue returning to this "Cast of 
Characters" table as he advances through this play. 



Classical 


Quantum 


{P(x,y\a)\Vx,y} G pd(Sx V ) 


p%y G dm(Hx V ) 


{P(a)\Va} G pd(5a) 


{w a \ya} G pd(5a) 


P(x,y,a) = P(x,y\a)P(a) 


K a — w n a 

lv xy — UJ at J xv 

Pxya = Ea 1") ("1 K Zy 


P(x\a) = j: y P(x,y\a) 


n a — tr n a 
Px_ lL yPxy 


P(x,a) = Y^yPi.x.y^a) 


K*x_ — tTy_K£ y 


P(y\a) = E x P(x,y\a) 


Py = ^ X x_P%y 


P(y,Ot) = J2xP( X 'V: a ) 


K a — tr K a 

V ~ E £V 


P(x,y) = T, a p ( x >y> a ) 


Pxy = X^a Kxy 




K * K o> 
pa — E. 

xy ~ xi) a 



In the classical case, we will use a functional R{Pxya] of Pxya defined by 

R[P xya }(x,y,a) = = P{x \ a )P{y\a)P{a) . (4) 

P\a) 

We will often abbreviate R[Px V a] by R if no confusion is likely. Note that R(x, y, a) > 
and J2 x ,y,a R{x, y, a) = 1 so R G pd^^^). In the quantum case, we will use a 
functional R" y [K^ y ] of if" defined by 

K a K a 

Rt y [K y ] = ^^ = w aP lp a y . (5) 

We will often abbreviate R%y[K* ] by i?" if no confusion is likely. Note that R^ y /w a G 
dm(Hxy) for all a. 

Suppose p^y G dm(Hxy) has eigensystem \4>j))\Vj}- Thus, 

Pxy = ^2^j\<f>j)(<f>3\ ■ ( 6 ) 



According to Ref.|p.7|, p% y can be expressed as 

Pmt = J2 w a\^»)(^\ ' ( 7 ) 

a 

where w, G pd(S'a), and \if) a ) G TCxy for all a, if and only if there exists a transforma- 
tion (a G Sa, j G S^) which is "right unitary": 
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^3 "V 3 ' 

and which satisfies 



E TfTT = Sf , (8) 



E^Al^^v^W- ( 9 ) 

3 

Suppose A : H — > 7i is an operator with eigensystem {(Aj, |0j))|Vj}. Thus, 

^ = E^X^I- (io) 

The support (ditto, kernel) of A is the subspace of H consisting of the zero vector and 
all those vectors in 7i for which A does not (ditto, does) vanish. Suppose e is a very 
small positive number and X[o,e](x) for real x is an indicator function that equals 1 if 
x G [0, e] and vanishes otherwise. Then we define the projectors 7ik er (A) and 7i supp (A) 
by 

Kker(A) =EX[0 )6 ](Aj)|&)<<&| , (11) 
3 

^supp(A) = 1 - 7T feer (A) . (12) 

3 Classical Physics Minimization 

In this section, we will discuss a minimization of the CMI for classical probabilities. 
The CMI for Pxya can be expressed as 



/ ( P{ x )U\ a 

H(x:y\a) = P{x,y, a) log 2 

x,y,a \ 

= p (x,y,a)\og 2 ' 



P(x\a)P(y\a) t 
P(x, y, a)P(a) 

ryx, y,a)LO£ 2 \ 

x,y,a 



P(x,a)P(y,a) 



X p i™°^{t&- (13) 



x,y,a 



Let Vda be the set of those Pxya £ pd(Sxya) for which the sum over a of 
P(x, y, a) equals a fixed P^ y G pd(S , ^ 2/ ): 

^cZa {^xya G pd(jS^y^) | i^jy -^xj/} ■ (1^) 

We define the entanglement E da by 

^cia = (l) p mm if (x : y\a) . (15) 
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In this definition of E c i a , is assumed to be fixed . Clearly, for large enough, 
E c ia — 0- Indeed, suppose N a = N x N y and a — > (x a ,y a ) is a 1-1 onto function from 

Sa to Sxy. Then H(x : y\a) = for pjx, y, a) = P(x a , y a )5 x x a 5l a . 

It is convenient to consider the following "Lagrangian" functional of two prob- 
ability distributions: 

C(P x _ m , P' xya ) = E P(*, v, «) ^ f , (is) 

where it!' = R[P xya \- E da can be defined in terms of this Lagrangian by 

^^^l^^'W- (17) 

Lemma 3.1 C(Pxya,P xya ) is convex (U) in its first argument. 
proof: 

The proof is very similar to the proof that the classical entropy H(P) is concave 
(fl) in P. Let f(x) = xlnx. If we can show that AC(P) = J2 x ,y,a f{P{ x -> Vi a )) i s 
convex in P, we will be done, because the remaining part C — AC is linear in P. Let 
A e [0, 1], A = 1 - A, and P — AP« + AP( 2 \ where P«, P^ E pd(S xya ). Since f(x) 
is convex in x, 



x,y,a 

< A ]T / (P (1) {x, y,a))+\J2f (P (2) {x, y, a)) 

x,y,a x,y,a 

= AA£(P (1) ) + AA£(P (2) ) . (18) 

QED 

(On the other hand, C(P, P') is not generally convex or concave in its second 
argument. This can be seen by taking the second derivative of C with respect to that 
argument.) 

Theorem 3.1 Let 

V'cla = • (19) 

At fixed P xy a e V da , £(P xy a, P xyg ) is minimized over all P' xyg _ e V' da iff P' xyg _ satisfies 

PL = Pa, (20a) 

PL = Pxa , (20b) 
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and 



P'ya = Pm ■ (20C) 



Thus, 



and 



£(Pxya> Pxya) ~ m *5, 7 C(Pxya, Pxya) ) (21) 



pi p-p- 

cla 



E <*« = [ ott ) r ™ , • ( 22 ) 



2 In 2 / P mz ev c i a P'^v' cla 
proof: 

The Lagrangian £ can be expressed in terms of the KL distance as follows: 

f d{p«//pQ 

J_r(P p' \-l +^P(a)D(P(x,y\a)//P(x\a)P(y\a)) . , 

In 2 1 ^ | + J2 a P(a)D(P(x\a)//P'(x\a)) " 1 6) 
{ +E a P(a)D(P(y\a)//P\y\a)) 

The KL distance is always non-negative and equals zero iff its two arguments are 
equal. Hence, Eqs. fl2"0|) are necessary and sufficient conditions for C(P,P') to have a 
global minimum in P' at fixed P. QED 

Theorem 3.2 At fixed P' xya E V'da, £(Pxya, P' X ya) ^ s minimized over all P^ m E V c i a 
iff Pxya satisfies 

P(a\x,y) = R'(a\x,y) (24) 

for all x,y,a. 
proof: 

Suppose a minimum is achieved. Define a new Lagrangian C tot by adding to 
C a Lagrange multiplier term that enforces the constraint that the sum over a of 
P(x,y,a) equals a fixed probability distribution P(x,y) E pd(Sx y )' 

Ctot = C(P*ya, Pxya) + £ H*> V) E P ^ ^ <*) " ^ 3/)] ■ ( 2 5) 

x,y a 

Ctot should not change if we vary infinitesimally and independently the quantities 
X(x, y) and P(x, y, a) for all x, y, a. Thus, 



dC tot 



dP(x Q , y a , a a ) 



In P(x , y a , a ) + l- In R'(x , y , a Q ) + X(x , y Q ) . (26) 
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Let 



A(x,y) = -1-X(x,y). (27) 

Then 

In P(x, y, a) = In R'(x, y, a) + A(x, y) . (28) 

Taking the exponential of both sides of the last equation and summing them over a 
yields the following constraint on A(x,y): 

P{x,y) =]T exp[ln#(x, y,a) + A{x,y)\ . (29) 

a 

Solving the last equation for A(x, y) yields: 

Eq . (p4[) now follows from Eqs. (p8|) and (p0|). 

£(P,P') has an extremum in P at fixed P' iff Eq.(p4|) is true. Furthermore, 
since £ is convex in its first argument, the extremum must be a global minimum. 
QED 

Theorem 3.3 The CMI minimum which defines E c i a is achieved iff 



P(x,y,a) = P(x,y) x R ^; V ' a \ . (31) 



Furthermore, 



where 



E «° = I 2to2 J <A> ' (32) 



A(x,,) = -ln(^4) , (33) 



P(x,y) 



and 



(A) = J2P(x,y)A(x,y). (34) 

x,y 

This justifies calling A an "entanglement operator". 
proof: 

The first part of this claim just brings together results obtained in the previous 
two theorems. The second part where E c \ a is expressed in terms of A follows from: 
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1 „.m„ 



Eda = {^) L P{x ' v ' a)la 



' ' £P(x,</,a)A(x,<,). (35) 

x,y,a 



2 In 2 



QED 

The last theorem gives certain conditions obeyed by any Pxya which achieves 

E da . Next we will define a sequence of probability distributions, Pj™ for n = 0, 1, 

The sequence will converge to Pxya as n — > oo. We will define our sequence recursively. 
In the following diagram, each quantity is defined in terms of the quantities that point 
to it. 



p(o) _> p(i) _> p(2) 



(36) 



Let P^ya be chosen arbitrarily from V c i a . For any n > 0, let 



p(n+l). 



P(x,y) 



(37) 



P[P/ n ) 



where P^ n -* 

In this paper, we won't prove that the sequence of P^ a converges. We defer 
that to future papers, confining ourselves here to presenting some empirical and intu- 
itive motivations for the sequence. In Section |6|, we give some computer results that 
are good empirical evidence of convergence. Note that if the limit of the sequence 
does exist, then the limit of Eq . (|37|) is Eq . (|3lD . 



4 Quantum Physics, Mixed Minimization 

In this section, we will discuss a quantum counterpart of the classical minimization 
problem discussed in the previous section. 

Consider all pxya £ dm(?i^„) of the special form: 

Pxya = \ a )( a \ W *Pxy = X! l«) ( a \ K xy > ( 38 ) 
a a 

where \a) for all a is an orthonormal basis of Ha, w. E pd(5* a ), and p" y G dm(7i^). 
As shown in Ref.g], the CMI for 

Pxya can be expressed as 



S p ^(x:y\a) = ^w Q 5 p| ^(x:y) 

a 
a 
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and 



Let 



t r x,J 



K-(log 2 ^-log 2 



K a K a 

my • 



£t % ,, [i^(log 2 iC y -log 2 i^ : 



(39) 



^Lxed = KJVa,iC = w a p a w. G pd(Sy,p" G dm(W a/ )} 



(40) 



k-mixed = {K* G lC mixed \ K xy = Pxy} ■ (41) 



Q 



We define the entanglement E mixed by 



E m ixed = ( ^ ) min S p ^(x : y\a) . (42) 

£ / xy ^'^■mixed 



In this definition of E mixe d, Na will be assumed to tend to infinity. Ref.[[L4]] shows 
that the limit is reached at a finite N a < (N x N y ) 2 . 

It is convenient to consider the following "Lagrangian" functional of two den- 
sity matrices: 

C(K« y , K'«) = £ t % , [i^(ln K a xy - In , (43) 

where R'® y = R° [K'^]. E mixed can be defined in terms of this Lagrangian by 

E mixed = ( ^) _ min £(K° y , Kg) . (44) 



2 111 2/ xy^^mixed 



Lemma 4.1 C(K xy , K'g) is convex (U) in its first argument. 
proof: 

The proof is very similar to the proof that the quantum entropy S(p) is con- 
cave (fl) in p (see Ref . [|T~2]| ) . Let f(x) = x\nx. If we can show that AC(K xy ) = 
J2 a foxy f (Kxy) * s convex in K xy , we will be done, because the remaining part C — AC 
is linear in Kg. Let A G [0, if, A = 1 - A, and K a = \K^ a + \K^ a . Here K^ a 

and K^ a belong to dm(TCxy) if we normalize them by dividing them by their trace. 
Let K a have eigensystem {(mf, |</>?))|Vj}. Since f(x) is convex in x, 
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AC{K C 



= E/W) = E/Wl^i» 

a,j a,j 

< A £ /( WIA-W-I^)) + A £ /((0J|K (2) 10J)) 

< AE(0il/(^ (1)Q )l0i> + AE(^I/(^ (2)Q )I^) 
= AA£(if (1)a ) + AA£(if (2)a ) . 



(45) 



QED 

Theorem 4.1 Let 



mixed 



K 



mixed ' 



(46) 

At fixed K' m G Ximixed, £{K%y, K'* y ) is minimized over all K'' y G K,' mixed iff K'* y 
satisfies 



w„, = w r 



(47a) 



and 



ll x XY x ' 



(47b) 



Thus, 



and 



proof: 



K a = K a . 

y y 



K*.£jC rnixed 



E ■ 

^ nil 



mixed 



min min C(K° ' K'") 



2 In 2 / K£ y eK mixe d K'*eK 



(47c) 



(48) 



(49) 



The Lagrangian £ can be expressed in terms of the KL distance as follows: 



In 2 



f D(wJ/w' a ) 
+ E a w a D(p1// P: _ 



(50) 



Hence, Eqs.(fl"T|) are necessary and sufficient conditions for £(K, K') to have a global 
minimum in K' at fixed K. QED 
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Theorem 4.2 At fixed K' y E K, 'mixed, £>{K^ y , K3) is minimized over all K* e 
iff K^ y satisfies 

lni^ = lni^ + A^, (51a) 

and 

p^ = ^exp(ln^ + A^). (51b) 



proof: 

Suppose a minimum is achieved. Define a new Lagrangian C tot by adding to 
d> a La grange multiplier term that enforces the constraint that the sum over a of K£ y 
equals a fixed density matrix p xy G dm(7-^ y ): 

C tot = £(K« y , K'«) + tTaglA^E K y ~ Pxy)} ■ (52) 

a 

Ctot should not change if we vary infinitesimally and independently the operators X xy 
and K£ y for all a. Thus, 

= SCtot — ^ ^ T x,y 
a 

Let 



^xy\ 



(53) 



A xy = "I - Xxy ■ (54) 

Then 

ln^ = ln<«+A ££ . (55) 

Taking the exponential of both sides of the last equation and summing them over a 
yields the following constraint on A^ y : 

Pxy = J2 eX P t ln R xy + A xy] ■ (56) 



C(K, K') has an extremum in K at fixed K' iff Eqs.flBTD are true. Furthermore, 
since C is convex in its first argument, the extremum must be a global minimum. QED 

Theorem 4.3 The CMI minimum which defines E mixe ^ is achieved iff 

lnjqj^lni^ + Agn (57a) 

and 
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p, £ = ]Texp[mi^ + A^]. (57b) 

a 

Furthermore, 

E mixed = (A) , (58) 

where 

(A) = tT^y(pxyAxy) ■ (59) 

This justifies calling A xy an "entanglement operator" . 
proof: 

The first part of this claim just brings together results obtained in the previous 
two theorems. The second part where E m i xe d is expressed in terms of A follows from: 



E mixed = (— J£tr^[l^(lnAJ-liii^)] 
1 



21n2 



tox,y{PxyAxy) ■ (60) 



QED 

The last theorem gives certain conditions obeyed by any pair (K£ y , A xy ) which 
achieves E m i xe d. In the classical mixed minimization problem, we were able to solve 
for A explicitly and substitute it into the remaining equations. Non-commutativity 
now prevents us from doing this. The way we will overcome the obstacle of non- 
commutativity is to solve for both K" y and A xy simultaneously. Next, we will define 

two sequences of operators, and for n = 0,1,2.... The sequences will 

converge to K£ y and A xy , respectively, as n — ► 00. We will define our two sequences 
recursively. In the following diagram, each quantity is defined in terms of the quan- 
tities that point to it.|l3| 

K a (°) -> K a W -> K a ^ ■ ■ ■ 

xy xy xy 

/ T / I (61) 

A(°) -> A« -> A( 2 ) ••■ 
xy ±y ±y 

Roughly speaking, our strategy is: estimate K£ y , use the latter to get a better 

estimate of A xy , use the latter to get a better estimate of K%L , use the latter to get a 

better estimate of A xyi and so on. Let K^^> be chosen arbitrarily from K, m ixed- Let 

Ag> = 0. For any n > 0, let 



7Ti CXp 

K a(n+1) 



InR^ + A^ 



-- J2a^ T xy(numerator) 



— , (62a) 
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(now use this K to produce an even better K, which we call K tilde:) 



£>a(n+l) 
2±y 



Tii exp 



In -I- A( n ) 

xy 1 xy 



7Ti 



J2 a ^ T xy (numerator) 



7Ti 



An) ■ 



A (n+1) 

33/ 



In e ^^/ (n+1) 



e 2 



(62b) 
(62c) 
(62d) 



■_,„. -• i , (L - - £ „ j, TO = ^ker(Pxy) and 7r X = 1 - 7T . 



where i?^ 

In this paper, we won't prove that the sequences and converge. 

We defer that to future papers, confining ourselves here to presenting some empirical 
and intuitive motivations for the sequences. In Section |], we give some computer 
results that are good evidence of convergence. It is easy to see that if the sequences 
do converge, then their limit satisfies Eqs. (|57|) . Indeed, as n — > oo, Eqs.(|62"D become 



K%y = tti exp 



lni^ + A 



7Ti . 



(63a) 



P-av = ti exp [In R% y + A^i 



(63b) 



Eq. ( |63a| ) arises from combining the limits of Eqs.( |62aD Eqs.( |62b| ). Eq. ( |63b| ) arises 
from combining the limits of Eqs. (|62cj ) and (|62d| ). Note also that when all operators 
are diagonal and therefore commute, Eqs. ( |62c| ) and ( |62d| ) give Eq.([33]), the definition 
of the classical A. If we set 7ri = 1 and ttq = for now, then Eq. (|63a|) is the same as 
Eq.([57a|). And Eq.(|63b|) is the same as Eq.(|5 



Now let us explain the purpose of the 7r operators. Ideally, we would want to 
define J (n+1) by 



j(n+l) 



:P 



(n+l). 



(64) 



where 



£>a(n+l) 



(65) 



However, if p% y has any zero eigenvalues, its inverse square root does not exist. Note 
p(n+i) tends to pxy For some small positive real e, we can define 



/pxy 



Jn+1)_ 



IPxtj 



(TTxp^TTx + eTTo) 



evr 



1 



Pxy + CTTO 



(66) 
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tti-^tt! I P (n+1) [ TTl-^TTx 1 + vro . (07) 



(The eir summands come into play only when p^ y vanishes.) This justifies Eq. (|62c|) . 
Pxy = J2 a K" y so we mus t also require that K° vanish over the kernel space of p^ y . 
We force this to happen by pre and post multiplying the right hand side of Eq.( |63a| ) 
by 7Ti. 

Another potential source of singular behavior in Eqs.(p^) is the function exp(ln R+ 
D) where R, D are Hermitian matrices and R can be singular. This is not a theoreti- 
cal disaster because even though the log of a zero eigenvalue of R gives minus infinity, 
upon taking the exponential of that minus infinity, we get a zero contribution. From 
a numerical point of view, calculating exp(lni? + D) accurately when R is singular 
poses a challenge. Our first impulse is to calculate the eigenvalue expansion of R, take 
the log of the eigenvalues of the latter expansion, add D to the result, calculate the 
eigenvalue expansion of In R+D, exponentiate the eigenvalues of the latter expansion. 
Finding the eigensystem of In R + D can be hard to do accurately when R is nearly 
singular and therefore In R + D contains some nearly infinite eigenvalues. There are, 
however, other ways of exponentiating a matrix which do not require calculating its 
eigensystem. Ref. [15| describes 19 "dubious" ways of exponentiating a matrix. Its 



authors use the adjective dubious because none of these methods is ideal. Some work 
only for certain types of matrices, others entail an excessive number of operations, 
others are too sensitive, etc. In our case, whenever we use exp(ln R + D), the matrix 
In R + D is expected to be Hermitian with non-positive eigenvalues. Method 4 of 
Ref-ITo] fits this situation perfectly. The method, first proposed by Colby et al in 



Ref. ||16||, is to approximate exp(— A) by a ratio of two n'th degree polynomials in A. 
The method works well even if some of the eigenvalues of A are nearly infinite, as 
long as they are all non-negative. 



5 Quantum Physics, Pure Minimization 

In this section, we will discuss another quantum minimization problem. This min- 
imization will differ from the quantum mixed minimization discussed previously in 
that now the range of our minimization will be restricted to those e dm(7i^ y ) of 
the special form: 

Ply = (68) 

where \i[> a ) G Hxy. 

As in the quantum mixed minimization problem, the CMI for Pxya can be 
expressed as 
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S Pmz (x:y\a) = J2 w <* S PiM : U) 

a 

= Y.MS(pl) + s( P a y )-s(p« y )} 

a 

= £ fe ( lQ g2 Ky ~ l0g 2 Ky)] ■ (69) 



Let 



Kure = {K y \Va,K« y = w a \i/> a )(i/> a \,w. e pd(Sy, G , (70) 

and 

K<pure — {Kxy G ^- pm - e | £ ^aj/ = P^j/} ■ (^-Q 
— a — 

We define the entanglement E pure by 

E pU re = (l) min S^fe : y|a) . (72) 



In this definition of E pure , will be assumed to tend to infinity. Ref. ||14|| shows that 
the limit is reached at a finite < (N^Ny) 2 . 

Note that since we are now assuming that the p" y are pure for all a, 

S Piy (x:y) = S(pl)+S(p^-S(p^ 

= lS{pl). (73) 

Note also that 

p^ = tx y _\^ a )(^ a \ . (74) 

If e = {(w a , \ip a ))\a G So} where p m = J2 a u, a|V'a:}('0a|> we ca U e a px y ensemble or 
preparation. Thus, our definition of E pure can be re-expressed as 

= mm^2w a S(tr y \ijj a }(ip a \) , (75) 

a 

where the minimum is taken over all p^ y ensembles e. This is precisely the definition 



usually given for the entanglement of formation] 18 1. Thus, E pure is identical to the 
entanglement of formation. 

It is convenient to consider the following "Lagrangian" functional of two den- 
sity matrices: 
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C(K^, K'«) = J2 tr^ [K^Qn K« y - In R'«)} , (76) 

a 

where R'° = R" y [K' x ^j . E pure can be defined in terms of this Lagrangian by 

Lemma 5.1 C(K^ y , K'^) is convex (\J) in its first argument. 
proof: 

See proof of analogous lemma in the section on quantum mixed minimization. 

QED 

Theorem 5.1 Let 

pure = Impure ■ (^8) 

At fixed K' y G ICp Ure , C(K% y , K'^ y ) is minimized over all K'* G IC' pure iff K'^ satisfies 

(79a) 



and 



Thus, 



and 



k': = k: , (79b) 



K a = K . (79c) 



C(K« y ,K« y ) = mm C(K« y , K'«) , (80) 

Kj^jtz/C pure 



B - = (2S2) ,<^J (K ^ ' (81) 

proof: 

See proof of analogous theorem in the section on quantum mixed minimization. 

QED 

Theorem 5.2 At fixed K'* y G IC' pure , C{K^ y) K'^) is minimized over all K* y G IC pure 
iff K xy = Wa\ipa) {ipa I satisfies 

\n(w a ) I Va> = (In R% + A.xy) |^> , (82a) 



17 



and 



'£!-! 



W a \ll>a)(l/>a\ 



(82b) 



proof: 



Let 



\n a ) = y/w^\ip a ) 



and 



A a = \nK« y -\nR« 



Then the Lagrangian C can be expressed as: 

C{K^,K'«) = 52tT m (\n a )(n a \A«) ■ 



54) 



55) 



Suppose a minimum is achieved. Define a new Lagrangian £ to t by adding to £ a 
Lagrange multiplier term that enforces the constraint that the sum over a of \n a )(n a \ 
equals a fixed density matrix p m e dm(7i £y ): 



Aot = K^) + tr £il ,[A^,(^ |n a )(n a | - p^j,)] 



(86) 



should not change if we vary infinitesimally and independently the operators X xy 
and \n a )(n a \ for all a. Thus, 



= 5C tot = J2 tr ^y s (\ n a)(n a \)(A a + 1 + A E 



57) 



Suppose we have an arbitrary Hermitian operator A acting on some Hilbert space 7i 
and |n) EH. If 



then 



so 



Let 



= tr [5{\n){n\)A] 



= (n\AS(\n)) + (6{n\)A\n) 



A\n) = . 



(88) 



59) 



(90) 



(91) 



Then Eq.(87) implies that 
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(lnJ^-lni<J-Ag,)|Vi a > = 0. (92) 

(Assume w a 7^ for all a). Suppose w G (0, 1]. Given any column vector \ip) G H, 
we can always find a unitary matrix [/ such that = U\0), where |0) is the unit 
vector which has one as its first component and zero for all others. Thus 



ln{w\^){^\)\^) = Uln(w\0){0\)WU\0) 



UdiagQnw, — oo, — oo, 



1 





ln(w)|-0) 



Thus, Eq.(^) implies that 



(93) 



(94) 



\nw a \ip a ) = (\nR«y + Axy)\ij a ) . 

C(K, K') has an extremum in K at fixed K' iff Eqs. (|32"D are true. Furthermore, 
since C is convex in its first argument, the extremum must be a global minimum. QED 



Theorem 5.3 The CMI minimum which defines E puTe is achieved iff 



and 



Furthermore, 



where 



ln(0|^a) = (hliC + ^xy)\^a) 



Pxy = ^2Wa\lpa)(4>a\ ■ 



E, 



pure 



21n2 



(A) 



(95a) 



(95b) 



(96) 



(A) — ^x,y(Pxy^-xy) ■ 

This justifies calling A xy an "entanglement operator". 



(97) 



proof: 

The first part of this claim just brings together results obtained in the previous 
two theorems. For a proof of the second part where E pure is expressed in terms of A, 
see the analogous theorem for quantum mixed minimization. QED 

As in the quantum mixed minimization problem, we will define two sequences 
of operators, = and A x n J for n = 0, 1, 2 The sequences will 
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converge to K° and A m , respectively, as n —>■ oo. We will define our two sequences 



recursively. Let K"^ be chosen arbitrarily from IC pure . Let = 0. For any n > 



let 



w ( n+1) |^( n+1)) = ^ lni? «(«) + A (n)j ^("+1)) 



xy 



A" 



a (n + l) = ^ n+1) |^" +1) )(^ ra+1) | 

So tr^ {numerator) 



~- 2 a foxy (numerator) 



v(") 



,(») ' 



A(n+1) 

mi 



In e^/<" +1 ) 



e 2 



(98a) 

(98b) 
(98c) 

(98d) 
(98e) 
(98f) 



where Rffl 



^[KgW], 7r = HkeriPxy) and 7Ti = 1 - 7T . In Eqs.(|9~8aD and fl98cD , 
we choose the largest eigenvalue and the corresponding eigenvector of the operator 
on the right hand side of the equation. As n tends to infinity, said operator tends 
towards a projection operator, so its eigenvalues all go to zero except for possibly one 
of them. 

In this paper, we won't prove that the sequences if^"' and Aj™ converge. 
We defer that to future papers, confining ourselves here to presenting some empirical 
and intuitive motivations for the sequences. In Section |^, we give some computer 
results that are good evidence of convergence. It is easy to see that if the sequences 
do converge, then their limit satisfies Eqs.fl95|). 



6 Causa Comun 

We have written a C++ program called Causa Comun ( "Common Cause" in Spanish). 
It's called this because entanglement is a manifestation of causality; it occurs between 
several events with a common cause. Causa Comun can do all three minimizations 
considered in this paper (classical, quantum mixed and quantum pure). For each 
of these three cases, it can find the entanglement, entanglement operator, and an 
optimal state decomposition. Next, we will discuss Causa Comun output for two 
examples of quantum states: Bell Mixtures, and Horodecki States. 
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6.1 Bell Mixtures 



In this example 

S & = Sy = Bool . (99) 
The following four states are usually called the "Bell basis" of TC xy : 

|^ ± > = l^> = ^(|01>±|10», (100) 

and 

|0 ± ) = |= ± ) = i=(|OO)±|ll)). (101) 

Let \B(p)) with fi G {0,1,2,3} represent the four Bell states. Call a "Bell 
mixture" any density matrix p xy expressible in the form 

p-^E^vlWXWh (102) 

where m. G pd(S' /x ). For any p G [0, 1], define the binary entropy h(p) by 

h(p) = -[plog 2 (p) + (1 -p)bg 2 (l -p)\ . (103) 

Ref-UU showed that for any Bell mixture px y , the entanglement of formation is given 
by: 



Eform.(pxy) = h ~J ' (104a) 

JO ^ ^Tlmax *^ 2 f 104b) 

1 (2m max — l) 2 otherwise 

where m max refers to the maximum of the weights m^. 

We will refer to the following one parameter family of Bell mixtures as the 
"Werner States": 



W(F) = F\ =+><=+ | + ^-L (| =-)(=- | + | ^}(^ | + | jr)ljr |) , (105) 
for F G [0,1]. 

Fig.|I| is Causa Comun output for E pure and E mixed of the Werner States using 
Ng_ = 6. We also got E mixe d = E pure = for < F < \ (not shown in graph). We 
compared E pure obtained using our algorithm and Ef orm obtained using Eq. (|104j) , and 
found them to agree well over the entire range F G [0, 1]. Note that E mixei i behaves 
like the entanglement of distillation: both are non-negative, less than or equal to 
Eform, and equal to Ef orm for pure density matrices. In a future paper, we will 
explain the close connection between E mixe( i and entanglement of distillation. 
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Figure 1: Pure and mixed entanglements for Werner States. 



6.2 Horodecki States 

In this example, 



S* = ^ = {0,1,2}. 



Let 



s xy 

^(S 01 + 5 12 + 5 20 ) 

g \ u xy ' u xy ' u xy) i 
Sjx'y' 

g \ w xy ' xy 1 w xyJ 



(106) 

(107) 
(108) 
(109) 



We will refer to the following one parameter family of density matrices as the "Horodecki 
States" : 



(110) 



where a G [2,5]. These states where first introduced in Ref.([|19j), where it was 
shown that they are separable for a G (2, 3), bound entangled for a G (3, 4), and free 
entangled for a G (4, 5). 
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Figure 2: Pure and mixed entanglements for Horodecki States. 

Fig.|] is Causa Comun output for E pure and E mixe d of the Horodecki States 
using Na = 12. We also got E mixed = E pure = for 2 < a < 3 (not shown in 
graph). 
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